2 research outputs found

    DF-TransFusion: Multimodal Deepfake Detection via Lip-Audio Cross-Attention and Facial Self-Attention

    With the rise in manipulated media, deepfake detection has become an imperative task for preserving the authenticity of digital content. In this paper, we present a novel multi-modal audio-video framework designed to process audio and video inputs concurrently for deepfake detection. Our model exploits lip synchronization with the input audio through a cross-attention mechanism while extracting visual cues via a fine-tuned VGG-16 network. A transformer encoder network is then employed to perform facial self-attention. We conduct multiple ablation studies highlighting different strengths of our approach. Our multi-modal methodology outperforms state-of-the-art multi-modal deepfake detection techniques in terms of F1 and per-video AUC scores.
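The cross-attention idea mentioned in the abstract can be illustrated with a minimal sketch of scaled dot-product attention in plain Python. This is not the paper's implementation: the choice of lip features as queries and audio features as keys and values is an assumption, the learned projection matrices of a real attention layer are omitted, and the names `softmax` and `cross_attention` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention where queries come from one modality
    (assumed here: lip-region features) and keys/values from another
    (assumed here: audio features). No learned projections are applied."""
    d = len(keys[0])
    out_dim = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(out_dim)])
    return outputs


# Toy example: 1 lip-frame query attending over 2 audio frames.
lip = [[1.0, 0.0]]
audio_keys = [[1.0, 0.0], [1.0, 0.0]]
audio_values = [[0.0, 0.0], [2.0, 4.0]]
attended = cross_attention(lip, audio_keys, audio_values)
```

Because both keys in the toy example are identical, the attention weights are equal and each output is the plain average of the value vectors.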

    Automatic Numerical Methods for Enhancement of Blurred Text-Images via Optimization and Nonlinear Diffusion

    In this paper, we propose an automatic numerical method for solving a nonlinear partial differential equation (PDE) based image-processing model. The Perona-Malik diffusion equation (PME) accounts for both forward and backward diffusion regimes so as to perform simultaneous denoising and deblurring depending on the value of the gradient. One limitation of this equation is that a large gradient value in the backward-diffusion regime can lead to singularity formation or staircasing. Guidotti, Kim, and Lambers (GKL) proposed a bound on backward diffusion to prevent staircasing: backward diffusion is confined to a specific range, beyond which it stops and forward diffusion begins. Our model combines the PME and GKL models for automatic sharpening of blurred text images using Nelder-Mead optimization, a derivative-free optimization method that uses n+1 test points arranged as a simplex for n-dimensional optimization. We solve our model by discretizing the PDE in space using a finite-difference approximation scheme. Then, we enhance the image in each iteration using Backward Euler time-stepping and the Minimum Residual Method (MINRES) in MATLAB. We also propose a gradient-based sharpness metric for our text images, which serves as the objective function for our Nelder-Mead optimizer. Our results show that the proposed model is accurate in enhancing text images and in predicting the unknown value of the blurring kernel for automatic sharpening. Numerical results show that the proposed objective sharpness measure coincides with the subjective sharpness of the enhanced image.
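To make the forward and backward diffusion regimes concrete, here is a minimal sketch of one Perona-Malik time step on a 1D signal. It departs from the abstract in several labeled ways: it uses explicit (forward) Euler rather than the Backward Euler and MINRES solver described, it works in 1D rather than on a 2D image, and it does not include the GKL bound. The diffusivity g(s) = 1 / (1 + (s/k)^2) is one common choice and is an assumption here, as are the names `pm_diffusivity` and `pm_step` and the parameters `k` and `dt`.

```python
def pm_diffusivity(grad, k):
    """Assumed Perona-Malik diffusivity g(s) = 1 / (1 + (s/k)^2).
    The flux s * g(s) increases for |s| < k (forward diffusion)
    and decreases for |s| > k (backward diffusion)."""
    return 1.0 / (1.0 + (grad / k) ** 2)


def pm_step(u, k, dt):
    """One explicit Euler step of 1D Perona-Malik diffusion with
    zero-flux boundaries, using finite differences on a unit grid."""
    n = len(u)
    # Flux at each interface between cell i and cell i + 1.
    flux = []
    for i in range(n - 1):
        d = u[i + 1] - u[i]
        flux.append(pm_diffusivity(d, k) * d)
    new_u = []
    for i in range(n):
        right = flux[i] if i < n - 1 else 0.0  # no flux past the right edge
        left = flux[i - 1] if i > 0 else 0.0   # no flux past the left edge
        new_u.append(u[i] + dt * (right - left))
    return new_u


# Toy signal: small oscillations (noise) on either side of a large jump (edge).
signal = [0.0, 0.1, 0.0, 0.1, 0.0, 5.0, 5.1, 5.0, 5.1, 5.0]
smoothed = pm_step(signal, k=1.0, dt=0.2)
```

In this toy signal the small oscillations fall in the forward regime and are damped, while the jump of size 5 exceeds `k` and sees a much smaller diffusivity. An implicit scheme such as the one the abstract describes would replace the explicit update with a linear solve at each step.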